Variation of Entropy and Parse Trees of Sentences as a Function of the Sentence Number
Abstract
In this paper we explore how sentences vary as a function of the sentence number. We demonstrate that while the entropy of a sentence increases with the sentence number, it decreases at paragraph boundaries, in accordance with the Entropy Rate Constancy principle (introduced in related work). We also demonstrate that the principle holds across different genres and languages, and we explore the role of genre informativeness. We investigate potential causes of entropy variation by looking at tree depth, branching factor, constituent size, and the occurrence of gapping.
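As a rough illustration of what "entropy as a function of the sentence number" can mean in practice, the following minimal sketch estimates per-word entropy for each sentence position and averages it across documents. It assumes an add-one smoothed unigram model, which is a simplification chosen here for brevity. The abstract does not say which language model the authors use, and the function names (`train_unigram`, `per_word_entropy`, `entropy_by_position`) are hypothetical.

```python
import math
from collections import Counter

def train_unigram(sentences):
    """Count word frequencies over tokenized training sentences."""
    counts = Counter(w for s in sentences for w in s)
    return counts, sum(counts.values())

def per_word_entropy(sentence, counts, total, vocab_size):
    """Average negative log2 probability per word.

    Assumption: add-one smoothed unigram model that ignores all
    preceding context (not necessarily the model used in the paper).
    """
    if not sentence:
        return 0.0
    log_prob = 0.0
    for w in sentence:
        p = (counts[w] + 1) / (total + vocab_size)
        log_prob += math.log2(p)
    return -log_prob / len(sentence)

def entropy_by_position(documents, counts, total, vocab_size):
    """Mean per-word entropy for each sentence number (1-based),
    averaged over all documents that have a sentence at that position."""
    sums, ns = {}, {}
    for doc in documents:
        for i, sent in enumerate(doc, start=1):
            h = per_word_entropy(sent, counts, total, vocab_size)
            sums[i] = sums.get(i, 0.0) + h
            ns[i] = ns.get(i, 0) + 1
    return {i: sums[i] / ns[i] for i in sorted(sums)}
```

Here each document is a list of sentences and each sentence is a list of tokens. `vocab_size` would typically be the number of distinct training words plus one slot for unseen words. Plotting the returned values against sentence number gives the kind of curve the abstract describes, under the stated modeling assumption.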
Similar resources
Trimming CFG Parse Trees for Sentence Compression Using Machine Learning Approaches
Sentence compression is a task of creating a short grammatical sentence by removing extraneous words or phrases from an original sentence while preserving its meaning. Existing methods learn statistics on trimming context-free grammar (CFG) rules. However, these methods sometimes eliminate the original meaning by incorrectly removing important parts of sentences, because trimming probabilities ...
Going beyond sentences when applying tree kernels
We go beyond the level of individual sentences applying parse tree kernels to paragraphs. We build a set of extended trees for a paragraph of text from the individual parse trees for sentences and learn short texts such as search results and social profile postings to take advantage of additional discourse-related information. Extension is based on coreferences and rhetoric structure relations ...
Survey and analysis of purposes of compositional sentences in Asra sureh
Abstract: Rhetoricians divide speech, with respect to its capacity to be true or false, into report and composition. Unlike a report sentence, a compositional sentence is neither true nor false. It comprises five types: imperative, interdictional, interrogational, supplicational, and vocative. Some compositional sentences have a secondary purpose. Asra, the seventeenth sureh of Holy ...
Sample Selection for Statistical Grammar Induction
Corpus-based grammar induction relies on using many hand-parsed sentences as training examples. However, the construction of a training corpus with detailed syntactic analysis for every sentence is a labor-intensive task. We propose to use sample selection methods to minimize the amount of annotation needed in the training data, thereby reducing the workload of the human annotators. This paper...
Triplet Extraction from Sentences
In this paper we present an approach to extracting subject-predicate-object triplets from English sentences. First, four well-known syntactic parsers for English are used to generate parse trees from the sentences; triplets are then extracted from the parse trees using parser-dependent techniques.